1 1. (Original) An apparatus comprising: 

2 at least one processor; 

3 a memory coupled to the at least one processor; 

4 a first instruction stream residing in the memory; and 

5 a profile-based loop optimizer residing in the memory and executed by the at least 

6 one processor, the loop optimizer inserting instrumentation code into the first instruction 

7 stream that collects profile data in at least one execution frequency table and thereby 

8 generating a second instruction stream, each execution frequency table indicating values 

9 representative of the number of times a corresponding loop is executed each time the loop 
10 is entered. 



1 2. (Original) The apparatus of claim 1 wherein each execution frequency table 

2 includes a plurality of entries, each entry containing: 

3 (1) a value representative of the number of times a loop is executed each time the 

4 loop is entered; and 

5 (2) a count of the occurrences of each value when the second instruction stream is 

6 executed. 
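As an illustration only, the two-field entry described above can be modeled as a histogram that maps each observed per-entry iteration count to the number of times it occurred. The sketch below uses Python's `collections.Counter`; the name `trip_counts` and the sample data are hypothetical and do not come from the claims.

```python
from collections import Counter

# Hypothetical profile data: the iteration count observed on each of six
# successive entries to one loop.
trip_counts = [2, 2, 3, 2, 100, 2]

# Each table entry pairs (1) a value, the iterations per loop entry, with
# (2) a count of how often that value occurred during the profiled run.
frequency_table = Counter(trip_counts)

for value, occurrences in sorted(frequency_table.items()):
    print(f"iterations={value:>3}  occurrences={occurrences}")
```

A dictionary-like table keeps one entry per distinct value, so a loop that almost always runs twice yields a small table regardless of how many times it is entered.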

1 3. (Original) The apparatus of claim 1 wherein the profile-based loop optimizer 

2 uses values in the at least one execution frequency table to peel at least one loop 

3 in the first instruction stream. 

1 4. (Original) The apparatus of claim 1 wherein the profile-based loop optimizer 

2 uses values in the at least one execution frequency table to unroll at least one loop 

3 in the first instruction stream. 

1 5. (Original) The apparatus of claim 1 wherein the profile-based loop optimizer 

2 uses values in the at least one execution frequency table to peel and unroll at least 

3 one loop in the first instruction stream. 
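For readers unfamiliar with the two transformations named in claims 3 through 5, the following sketch shows hand-written equivalents of a peeled and an unrolled loop. It is an assumption-laden illustration, not the optimizer's output: the function names, the peel count of 2, and the unroll factor of 4 are all hypothetical.

```python
def total(xs):
    """Original loop."""
    acc = 0
    for x in xs:
        acc += x
    return acc


def total_peeled(xs):
    """First two iterations peeled into straight-line code.

    Pays off when the loop usually runs only once or twice per entry.
    """
    acc = 0
    n = len(xs)
    if n > 0:
        acc += xs[0]
    if n > 1:
        acc += xs[1]
    for i in range(2, n):  # residual loop for the uncommon long case
        acc += xs[i]
    return acc


def total_unrolled(xs):
    """Loop body replicated four times per trip, plus a remainder loop.

    Pays off when the loop usually runs many times per entry.
    """
    acc = 0
    n = len(xs)
    i = 0
    while i + 4 <= n:
        acc += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:  # remainder iterations
        acc += xs[i]
        i += 1
    return acc
```

All three functions return the same result for any input; only the shape of the control flow differs.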
1 6. (Original) The apparatus of claim 1 wherein the instrumentation code

2 comprises: 

3 code to allocate a loop iteration counter for a selected loop; 

4 code to allocate the execution frequency table for the selected loop; 

5 code to clear the loop iteration counter on all entry paths to the selected loop; 

6 code to increment the loop iteration counter in a header block for the selected 

7 loop; and 

8 code to read the loop iteration counter and update the execution frequency table 

9 along all exit paths from the selected loop. 
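The five pieces of instrumentation code listed in claim 6 can be sketched at source level as follows. This is a minimal hand-instrumented example under stated assumptions: the function `instrumented_search` and its linear-search loop are hypothetical, the table is modeled as a `Counter` allocated by the caller, and the counter counts executions of the header block, so a loop that exits immediately records 1.

```python
from collections import Counter


def instrumented_search(items, target, frequency_table):
    """Linear search with profiling code inserted by hand."""
    iteration_counter = 0  # allocated, then cleared on the entry path
    index = 0
    while True:
        iteration_counter += 1  # incremented in the loop header block
        if index >= len(items):
            # Exit path 1 (not found): read counter, update table.
            frequency_table[iteration_counter] += 1
            return -1
        if items[index] == target:
            # Exit path 2 (early exit): read counter, update table.
            frequency_table[iteration_counter] += 1
            return index
        index += 1


# Usage: the table is allocated once and accumulates across loop entries.
table = Counter()
instrumented_search([5, 7, 9], 7, table)
instrumented_search([5, 7, 9], 7, table)
instrumented_search([5, 7, 9], 4, table)
```

Updating the table on every exit path matters because a loop with an early exit would otherwise record only the runs that terminate through the header test.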
1 7. (Original) An apparatus comprising:

2 at least one processor; 

3 a memory coupled to the at least one processor; 

4 a first instruction stream residing in the memory; and 

5 a profile-based loop optimizer residing in the memory and executed by the at least 

6 one processor, the loop optimizer optimizing at least one loop in the first instruction 

7 stream according to profile data stored in at least one execution frequency table, each 

8 execution frequency table indicating values representative of the number of times a 

9 corresponding loop is executed each time the loop is entered. 

1 8. (Original) The apparatus of claim 7 wherein each execution frequency table 

2 includes a plurality of entries, each entry containing: 

3 (1) a value representative of the number of times a loop is executed each time the 

4 loop is entered; and 

5 (2) a count of the occurrences of each value when the first instruction stream is 

6 executed. 

1 9. (Original) The apparatus of claim 7 wherein the profile-based loop optimizer 

2 uses values in the at least one execution frequency table to peel at least one loop 

3 in the first instruction stream. 
1 10. (Currently amended) The apparatus of claim 9 wherein the profile-based loop

2 optimizer peels a selected loop if one of the following conditions are true: 

3 (A) the execution frequency table for the selected loop has a dominant mode that 

4 is less than a specified peeling threshold; 

5 (B) the execution frequency table for the selected loop does not have a dominant 

6 mode, and [most of the execution frequencies] a majority of the values in the execution 

7 frequency table are smaller than the specified peeling threshold. 

1 11. (Currently amended) The apparatus of claim 10 wherein the profile-based loop

2 optimizer unrolls the selected loop if: 

3 neither (A) nor (B) are true; and 

4 [most of the execution frequencies] a majority of the values in the execution 

5 frequency table for the selected loop are greater than a specified unrolling threshold. 
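The peel and unroll conditions of claims 10 and 11 can be sketched as a decision function. Several details here are assumptions, because the claims do not fix them: a "dominant mode" is taken to be a value that accounts for more than half of all loop entries, "a majority of the values" is taken to mean more than half of the recorded entries (weighted by occurrence count), and the threshold defaults of 4 and 16 are hypothetical.

```python
from collections import Counter


def dominant_mode(table, share=0.5):
    """Return the value covering more than `share` of entries, else None.

    The `share` cutoff is an assumption, not taken from the claims.
    """
    total = sum(table.values())
    if total == 0:
        return None
    value, count = table.most_common(1)[0]
    return value if count > share * total else None


def majority(table, predicate):
    """True if more than half of the recorded loop entries satisfy predicate."""
    total = sum(table.values())
    hits = sum(count for value, count in table.items() if predicate(value))
    return total > 0 and 2 * hits > total


def choose_transformation(table, peel_threshold=4, unroll_threshold=16):
    mode = dominant_mode(table)
    # (A) a dominant mode exists and is below the peeling threshold
    cond_a = mode is not None and mode < peel_threshold
    # (B) no dominant mode, but most values are below the peeling threshold
    cond_b = mode is None and majority(table, lambda v: v < peel_threshold)
    if cond_a or cond_b:
        return "peel"
    # Neither (A) nor (B): unroll if most values exceed the unrolling threshold
    if majority(table, lambda v: v > unroll_threshold):
        return "unroll"
    return "none"
```

Under these assumptions, a table such as `Counter({2: 8, 50: 2})` selects peeling through condition (A), while `Counter({100: 6, 200: 4})` fails both peeling conditions and selects unrolling.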

1 12. (Original) The apparatus of claim 7 wherein the profile-based loop optimizer 

2 uses values in the at least one execution frequency table to unroll at least one loop 

3 in the first instruction stream. 

1 13. (Original) The apparatus of claim 7 wherein the profile-based loop optimizer 

2 uses values in the at least one execution frequency table to peel and unroll at least 

3 one loop in the first instruction stream. 

1 14. (Original) The apparatus of claim 7 wherein the profile-based loop optimizer

2 determines whether to peel or unroll a loop based on a dominant mode, if present, 

3 in the execution frequency table corresponding to the loop. 
1 15. (Original) An apparatus comprising:

2 (A) at least one processor; 

3 (B) a memory coupled to the at least one processor; 

4 (C) a first instruction stream residing in the memory; and 

5 (D) a profile-based loop optimizer residing in the memory and executed by the at 

6 least one processor, the loop optimizer inserting instrumentation code into the first 

7 instruction stream that collects profile data in at least one execution frequency table and 

8 thereby generating a second instruction stream, wherein each execution frequency table 

9 includes a plurality of entries, each entry containing: 

10 (D1) a value representative of the number of times a loop is executed each

11 time the loop is entered; and

12 (D2) a count of the occurrences of each value when the second instruction 

13 stream is executed; 

14 (E) wherein the instrumentation code comprises: 

15 (E1) code to allocate a loop iteration counter for a selected loop;

16 (E2) code to allocate the execution frequency table for the selected loop; 

17 (E3) code to clear the loop iteration counter on all entry paths to the

18 selected loop; 

19 (E4) code to increment the loop iteration counter in a header block for the 

20 selected loop; and 

21 (E5) code to read the loop iteration counter and update the execution 

22 frequency table along all exit paths from the selected loop; 

23 (F) the loop optimizer optimizing a loop in the first instruction stream according 

24 to profile data stored in the at least one execution frequency table by peeling the loop, 

25 unrolling the loop, or both peeling and unrolling the loop based on profile data stored in 

26 the execution frequency table corresponding to the loop. 
1 16. (Currently amended) The apparatus of claim 15 wherein the profile-based loop

2 optimizer peels a selected loop if one of the following conditions are true: 

3 (A) the execution frequency table for the selected loop has a dominant mode that 

4 is less than a specified peeling threshold; 

5 (B) the execution frequency table for the selected loop does not have a dominant 

6 mode, and [most of the execution frequencies] a majority of the values in the execution 

7 frequency table are smaller than the specified peeling threshold. 



1 17. (Currently amended) The apparatus of claim 16 wherein the profile-based loop

2 optimizer unrolls the selected loop if: 

3 neither (A) nor (B) are true; and 

4 [most of the execution frequencies] a majority of the values in the execution 

5 frequency table for the selected loop are greater than a specified unrolling threshold. 
1 18. (Original) A method for instrumenting a first instruction stream comprising

2 the steps of: 

3 inserting code into the first instruction stream for a selected loop that defines at 

4 least one execution frequency table for the selected loop, each execution frequency table 

5 indicating values representative of the number of times the selected loop is executed each 

6 time the selected loop is entered; 

7 inserting code into the first instruction stream that updates the execution 

8 frequency table according to the number of times the selected loop is executed each time 

9 the loop is entered. 



1 19. (Original) A method for instrumenting a first instruction stream comprising

2 the steps of: 

3 inserting code that allocates a loop iteration counter for a selected loop in the first 

4 instruction stream; 

5 inserting code that allocates an execution frequency table that corresponds to the 

6 selected loop; and 

7 inserting code to clear the loop iteration counter on all entry paths to the selected 

8 loop; 

9 inserting code to increment the loop iteration counter in a header block for the 

10 selected loop; and 

11 inserting code to read the loop iteration counter and update the execution

12 frequency table along all exit paths from the selected loop according to the number of 

13 times the selected loop is executed each time the loop is entered. 
1 20. (Original) A method for optimizing at least one loop in a first instruction

2 stream, the method comprising the steps of: 

3 inserting instrumentation code into the first instruction stream that collects profile 

4 data in at least one execution frequency table and thereby generating a second instruction 

5 stream, each execution frequency table indicating values representative of the number of 

6 times a corresponding loop is executed each time the loop is entered; and 

7 optimizing at least one loop in the first instruction stream according to profile data 

8 stored in at least one execution frequency table. 

1 21. (Original) The method of claim 20 wherein each execution frequency table 

2 includes a plurality of entries, each entry containing: 

3 (1) a value representative of the number of times a loop is executed each time the 

4 loop is entered; and 

5 (2) a count of the occurrences of each value when the first instruction stream is 

6 executed. 

1 22. (Original) The method of claim 20 further comprising the step of using values 

2 in the at least one execution frequency table to peel at least one loop in the first 

3 instruction stream. 

1 23. (Currently amended) The method of claim 20 wherein the step of using values 

2 in the at least one execution frequency table to peel at least one loop in the first 

3 instruction stream peels a selected loop if one of the following conditions are true: 

4 (A) the execution frequency table for the selected loop has a dominant mode that 

5 is less than a specified peeling threshold; 

6 (B) the execution frequency table for the selected loop does not have a dominant 

7 mode, and [most of the execution frequencies] a majority of the values in the execution 

8 frequency table are smaller than the specified peeling threshold. 
1 24. (Currently amended) The method of claim 23 further comprising the step of

2 unrolling the selected loop if: 

3 neither (A) nor (B) are true; and 

4 [most of the execution frequencies] a majority of the values in the execution 

5 frequency table for the selected loop are greater than a specified unrolling threshold. 

1 25. (Original) The method of claim 20 further comprising the step of using values 

2 in the at least one execution frequency table to unroll at least one loop in the first 

3 instruction stream. 

1 26. (Original) The method of claim 20 further comprising the step of using values 

2 in the at least one execution frequency table to peel and unroll at least one loop in 

3 the first instruction stream. 

1 27. (Original) The method of claim 20 further comprising the step of determining 

2 whether to peel or unroll a loop based on a dominant mode, if present, in the 

3 execution frequency table corresponding to the loop. 
1 28. (Original) A method for optimizing a plurality of loops in a first instruction

2 stream, the method comprising the steps of: 

3 (A) inserting code that allocates a loop iteration counter for at least one loop in the 

4 first instruction stream; 

5 (B) inserting code that allocates an execution frequency table that corresponds to a 

6 loop in the first instruction stream, wherein each execution frequency table includes a 

7 plurality of entries, each entry containing: 

8 (1) a value representative of the number of times a loop is executed each 

9 time the loop is entered; and 

10 (2) a count of the occurrences of each value; 

11 (C) inserting code to clear the loop iteration counter on all entry paths to the

12 selected loop; 

13 (D) inserting code to increment the loop iteration counter in a header block for the 

14 selected loop; and 

15 (E) inserting code to read the loop iteration counter and update the execution 

16 frequency table along all exit paths from the selected loop according to the number of 

17 times the selected loop is executed each time the loop is entered; 

18 (F) the inserting code in steps (A) through (E) generating a second instruction

19 stream; 

20 (G) executing the second instruction stream with sample inputs to collect profile 

21 data in the at least one execution frequency table; 

22 (H) using values in the at least one execution frequency table to peel at least one 

23 loop in the first instruction stream based on profile data stored in the execution frequency 

24 table corresponding to the loop; and 
25 (I) using values in the at least one execution frequency table to unroll at least one 

26 loop in the first instruction stream based on profile data stored in the execution frequency

27 table corresponding to the loop. 

1 29. (Currently amended) The method of claim 28 wherein the step of using values 

2 in the at least one execution frequency table to peel at least one loop in the first 

3 instruction stream peels a selected loop if one of the following conditions are true: 

4 (A) the execution frequency table for the selected loop has a dominant mode that 

5 is less than a specified peeling threshold; 

6 (B) the execution frequency table for the selected loop does not have a dominant 

7 mode, and [most of the execution frequencies] a majority of the values in the execution 

8 frequency table are smaller than the specified peeling threshold. 

1 30. (Currently amended) The method of claim 29 wherein the step of using values 

2 in the at least one execution frequency table to unroll at least one loop in the first 

3 instruction stream unrolls the selected loop if: 

4 neither (A) nor (B) are true; and 

5 [most of the execution frequencies] a majority of the values in the execution 

6 frequency table for the selected loop are greater than a specified unrolling threshold. 
1 31. (Original) A program product comprising:

2 (A) a profile-based loop optimizer that inserts instrumentation code into a first 

3 instruction stream that collects profile data in at least one execution frequency table and 

4 thereby generates a second instruction stream, each execution frequency table indicating 

5 values representative of the number of times a corresponding loop is executed each time 

6 the loop is entered; and 

7 (B) computer-readable signal bearing media bearing the profile-based loop 

8 optimizer. 

1 32. (Original) The program product of claim 31 wherein the computer-readable

2 signal bearing media comprises recordable media. 

1 33. (Original) The program product of claim 31 wherein the computer-readable

2 signal bearing media comprises transmission media. 

1 34. (Original) The program product of claim 31 wherein each execution

2 frequency table includes a plurality of entries, each entry containing: 

3 (1) a value representative of the number of times a loop is executed each time the 

4 loop is entered; and 

5 (2) a count of the occurrences of each value when the second instruction stream is 

6 executed. 

1 35. (Original) The program product of claim 31 wherein the profile-based loop 

2 optimizer uses values in the at least one execution frequency table to peel at least 

3 one loop in the first instruction stream. 

1 36. (Original) The program product of claim 31 wherein the profile-based loop

2 optimizer uses values in the at least one execution frequency table to unroll at 

3 least one loop in the first instruction stream. 
37. (Original) The program product of claim 31 wherein the profile-based loop
optimizer uses values in the at least one execution frequency table to peel and 
unroll at least one loop in the first instruction stream. 

38. (Original) The program product of claim 31 wherein the instrumentation code
comprises: 

code to allocate a loop iteration counter for a selected loop; 
code to allocate the execution frequency table for the selected loop; 
code to clear the loop iteration counter on all entry paths to the selected loop; 
code to increment the loop iteration counter in a header block for the selected 
loop; and 

code to read the loop iteration counter and update the execution frequency table 
along all exit paths from the selected loop. 
1 39. (Original) A program product comprising:

2 (A) a profile-based loop optimizer that optimizes at least one loop in a first 

3 instruction stream according to profile data stored in at least one execution frequency 

4 table, each execution frequency table indicating values representative of the number of 

5 times a corresponding loop is executed each time the loop is entered; and 

6 (B) computer-readable signal bearing media bearing the profile-based loop 

7 optimizer. 

1 40. (Original) The program product of claim 39 wherein the computer-readable 

2 signal bearing media comprises recordable media. 

1 41. (Original) The program product of claim 39 wherein the computer-readable 

2 signal bearing media comprises transmission media. 

1 42. (Original) The program product of claim 39 wherein each execution 

2 frequency table includes a plurality of entries, each entry containing: 

3 (1) a value representative of the number of times a loop is executed each time the 

4 loop is entered; and 

5 (2) a count of the occurrences of each value when the first instruction stream is 

6 executed. 

1 43. (Original) The program product of claim 39 wherein the profile-based loop

2 optimizer uses values in the at least one execution frequency table to peel at least 

3 one loop in the first instruction stream. 
1 44. (Currently amended) The program product of claim 39 wherein the profile-based

2 loop optimizer peels a selected loop if one of the following conditions are

3 true: 

4 (A) the execution frequency table for the selected loop has a dominant mode that 

5 is less than a specified peeling threshold; 

6 (B) the execution frequency table for the selected loop does not have a dominant 

7 mode, and [most of the execution frequencies] a majority of the values in the execution 

8 frequency table are smaller than the specified peeling threshold. 

1 45. (Currently amended) The program product of claim 44 wherein the profile- 

2 based loop optimizer unrolls the selected loop if: 

3 neither (A) nor (B) are true; and 

4 [most of the execution frequencies] a majority of the values in the execution 

5 frequency table for the selected loop are greater than a specified unrolling threshold. 

1 46. (Original) The program product of claim 39 wherein the profile-based loop 

2 optimizer uses values in the at least one execution frequency table to unroll at 

3 least one loop in the first instruction stream. 

1 47. (Original) The program product of claim 39 wherein the profile-based loop 

2 optimizer uses values in the at least one execution frequency table to peel and 

3 unroll at least one loop in the first instruction stream. 

1 48. (Original) The program product of claim 39 wherein the profile-based loop 

2 optimizer determines whether to peel or unroll a loop based on a dominant mode, 

3 if present, in the execution frequency table corresponding to the loop. 
1 49. (Original) A program product comprising:

2 (A) a profile-based loop optimizer that inserts instrumentation code into a first 

3 instruction stream that collects profile data in at least one execution frequency table and 

4 thereby generating a second instruction stream, wherein each execution frequency table 

5 includes a plurality of entries, each entry containing: 

6 a value representative of the number of times a loop is executed each time 

7 the loop is entered; and 

8 a count of the occurrences of each value when the second instruction 

9 stream is executed; 

10 wherein the instrumentation code comprises: 

11 code to allocate a loop iteration counter for a selected loop;

12 code to allocate the execution frequency table for the selected loop; 

13 code to clear the loop iteration counter on all entry paths to the selected 

14 loop; 

15 code to increment the loop iteration counter in a header block for the

16 selected loop; and 

17 code to read the loop iteration counter and update the execution frequency 

18 table along all exit paths from the selected loop; 

19 the loop optimizer optimizing a loop in the first instruction stream according to 

20 profile data stored in the at least one execution frequency table by peeling the loop, 

21 unrolling the loop, or both peeling and unrolling the loop based on profile data stored in

22 the execution frequency table corresponding to the loop; and 

23 (B) computer-readable signal bearing media bearing the profile-based loop 

24 optimizer. 

1 50. (Original) The program product of claim 49 wherein the computer-readable 

2 signal bearing media comprises recordable media. 
1 51. (Original) The program product of claim 49 wherein the computer-readable

2 signal bearing media comprises transmission media. 

1 52. (Currently amended) The program product of claim 49 wherein the profile- 

2 based loop optimizer peels a selected loop if one of the following conditions are 

3 true: 

4 (A) the execution frequency table for the selected loop has a dominant mode that 

5 is less than a specified peeling threshold; 

6 (B) the execution frequency table for the selected loop does not have a dominant 

7 mode, and [most of the execution frequencies] a majority of the values in the execution 

8 frequency table are smaller than the specified peeling threshold. 

1 53. (Currently amended) The program product of claim 52 wherein the profile-based

2 loop optimizer unrolls the selected loop if:

3 neither (A) nor (B) are true; and 

4 [most of the execution frequencies] a majority of the values in the execution 

5 frequency table for the selected loop are greater than a specified unrolling threshold. 